Audio-visual speaker identification using coupled hidden Markov models
Abstract
In this paper, we investigate the use of coupled hidden Markov models (CHMMs) for the task of audio-visual text-dependent speaker identification. Our system determines the identity of the user from a temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth, respectively. The multimodal observation sequences are then modeled using a set of CHMMs, one for each phoneme-viseme pair and for each person in the database. The use of CHMMs in our system is justified by the capacity of this model to describe the natural audio and visual state asynchrony as well as their conditional dependency over time. To train a CHMM, we first train a speaker-independent model using expectation-maximization (EM), and then we build a speaker-dependent model using maximum a posteriori (MAP) training. Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 0 to 30 dB.
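To make the coupling between the audio and visual chains concrete, the following is a minimal sketch of how the likelihood of an observation sequence could be scored under a two-chain CHMM with the forward algorithm. It is not taken from the paper: the function name `chmm_likelihood`, the argument layout, and the assumption that each chain's next state depends on the previous states of both chains are illustrative choices, and the per-frame observation likelihoods are assumed to be computed elsewhere.

```python
from itertools import product


def chmm_likelihood(pi_a, pi_v, trans_a, trans_v, lik_a, lik_v):
    """Forward-algorithm likelihood for a two-chain coupled HMM (illustrative).

    pi_a[i], pi_v[j]  : initial state probabilities of the audio / visual chain.
    trans_a[i][j][k]  : P(audio state k at t | audio i, visual j at t-1).
    trans_v[i][j][l]  : P(visual state l at t | audio i, visual j at t-1).
    lik_a[t][k]       : likelihood of the audio observation at frame t in state k.
    lik_v[t][l]       : likelihood of the visual observation at frame t in state l.
    """
    states = list(product(range(len(pi_a)), range(len(pi_v))))

    # Initialisation: joint probability of each (audio, visual) state pair.
    alpha = {
        (i, j): pi_a[i] * pi_v[j] * lik_a[0][i] * lik_v[0][j]
        for i, j in states
    }

    # Recursion: each chain's transition is conditioned on both previous states,
    # which is what lets the chains drift out of sync yet stay dependent.
    for t in range(1, len(lik_a)):
        new_alpha = {}
        for k, l in states:
            total = sum(
                alpha[i, j] * trans_a[i][j][k] * trans_v[i][j][l]
                for i, j in states
            )
            new_alpha[k, l] = total * lik_a[t][k] * lik_v[t][l]
        alpha = new_alpha

    # Termination: sum over all joint end states.
    return sum(alpha.values())


if __name__ == "__main__":
    # Toy model: two states per chain, uniform transitions and priors.
    uniform = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
    frames = [[1.0, 1.0]] * 3  # placeholder observation likelihoods
    print(chmm_likelihood([0.5, 0.5], [0.5, 0.5], uniform, uniform, frames, frames))
```

Under this sketch, identification would amount to scoring the same observation sequence against each enrolled speaker's models and choosing the speaker with the highest likelihood. The code works with raw probabilities for readability; a practical version would use scaling or log-domain arithmetic to avoid underflow on long sequences.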
Similar resources
Speaker Independent Speech Recognition Using Hidden Markov Models for Persian Isolated Words
Employing Second-Order Circular Suprasegmental Hidden Markov Models to Enhance Speaker Identification Performance in Shouted Talking Environments
Speaker identification performance is almost perfect in neutral talking environments. However, performance deteriorates significantly in shouted talking environments. This work is devoted to proposing, implementing, and evaluating new models called Second-Order Circular Suprasegmental Hidden Markov Models (CSPHMM2s) to alleviate the deteriorated performance in the shouted talking environ...
Bimodal speech recognition using coupled hidden Markov models
In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences betwee...
Audio-Visual Speaker Verification using Continuous Fused HMMs
This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than vid...
Publication date: 2003